A Pilot PropBank Annotation for Quranic Arabic
Abstract
The Quran is a significant religious text written in a unique literary style that is close to poetic language. Accordingly, it is significantly richer and more complex than the newswire style used in the previously released Arabic PropBank (Zaghouani et al., 2010; Diab et al., 2008). We present preliminary work on the creation of an Arabic proposition repository for Quranic Arabic. We annotate the semantic roles for the 50 most frequent verbs in the Quranic Arabic Dependency Treebank (QATB) (Dukes and Buckwalter, 2010). The Quranic Arabic PropBank (QAPB) will be a unique new resource for the Arabic NLP research community because it will offer insights into the semantic use of classical Arabic, poetic literary Arabic, and significant religious texts. Moreover, on a pragmatic level, QAPB will add approximately 810 new verbs to the existing Arabic PropBank (APB). In this pilot experiment, we leverage our knowledge and experience from our involvement in the APB project. All the QAPB annotations will be made freely available for research purposes.
Similar resources
Morphological Annotation of Quranic Arabic
The Quranic Arabic Corpus (http://corpus.quran.com) is an annotated linguistic resource with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old central religious text of Islam. This paper...
Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank
The Quranic Arabic Dependency Treebank (QADT) is part of the Quranic Arabic Corpus (http://corpus.quran.com), an online linguistic resource organized by the University of Leeds, and developed through online collaborative annotation. The website has become a popular study resource for Arabic and the Quran, and is now used by over 1,500 researchers and students daily. This paper presents the tree...
The Revised Arabic PropBank
The revised Arabic PropBank (APB) reflects a number of changes to the data and the process of PropBanking. Several changes stem from Treebank revisions. An automatic process was put in place to map existing annotation to the new trees. We have revised the original 493 Frame Files from the Pilot APB and added 1462 new files for a total of 1955 Frame Files with 2446 framesets. In addition to a he...
Supervised collaboration for syntactic annotation of Quranic Arabic
The Quranic Arabic Corpus (http://corpus.quran.com) is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation (Dukes & Habash, 2010) and syntactic analysis using dependency grammar (Dukes & Buckwalter, 2010). The motivation behind this work is to produce a resource th...
A Pilot Arabic Propbank
In this paper, we present the details of creating a pilot Arabic proposition bank (Propbank). Propbanks exist for both English and Chinese. However, the morphological and syntactic expression of linguistic phenomena in Arabic yields a very different type of process in creating an Arabic propbank. Hence, we highlight those characteristics of Arabic that make creating a propbank for the language a ...